In this notebook, we work on a dataset from the IMDB site to classify movie reviews as “positive” or “negative”, based solely on the text content of the reviews.

library(readr)
library(tidyr)
library(tibble)
library(plotly)

The IMDB dataset

The IMDB dataset is a set of 50,000 highly polarized reviews from the Internet Movie Database. They are split into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of 50% negative and 50% positive reviews. The IMDB dataset comes packaged with Keras. It has already been preprocessed: the reviews (sequences of words) have been turned into sequences of integers, where each integer stands for a specific word in a dictionary.

Loading the dataset

library(keras)
imdb <- dataset_imdb(num_words = 10000)
c(c(train_data, train_labels), c(test_data, test_labels)) %<-% imdb

The argument num_words = 10000 means that we will only keep the top 10,000 most frequently occurring words in the training data. Rare words will be discarded. This allows us to work with vector data of manageable size.

The variables train_data and test_data are lists of reviews, each review being a list of word indices (encoding a sequence of words). train_labels and test_labels are lists of 0s and 1s, where 0 stands for “negative” and 1 stands for “positive”:

train_labels[[1]]
[1] 1
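
The reviews themselves are stored the same way; each one is simply a variable-length sequence of integer word indices. A quick look (output not shown here):

length(train_data)      # number of training reviews (25,000)
str(train_data[[1]])    # the first review: a sequence of integer word indices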

Since only the top 10,000 most frequent words are kept, no word index will exceed 10,000:

max(sapply(train_data, max))
[1] 9999
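
For orientation, a review can be decoded back into words using the word index that ships with the dataset. This is a minimal sketch, assuming the default dataset_imdb() convention that indices 1-3 are reserved (padding, start of sequence, unknown), so real word indices are offset by 3:

word_index <- dataset_imdb_word_index()      # named list: word -> integer index
reverse_word_index <- names(word_index)
names(reverse_word_index) <- word_index      # index (as character) -> word
decoded_review <- sapply(train_data[[1]], function(i) {
  # indices 1-3 are reserved, so real words are looked up at i - 3
  word <- if (i > 3) reverse_word_index[[as.character(i - 3)]] else NULL
  if (is.null(word)) "?" else word
})
head(decoded_review, 20)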

Data Preparation

The lists of word indices are one-hot encoded into 10,000-dimensional vectors of 0s and 1s, so that the first layer of the network can be a dense layer that handles floating-point vector data; the labels are likewise converted to numeric vectors.

vectorize_sequences <- function(sequences, dimension = 10000) {
  # Creating all-zero matrix of shape (len(sequences), dimension)
  results <- matrix(0, nrow = length(sequences), ncol = dimension)
  for (i in 1:length(sequences))
    # Setting specific indices of results[i] to 1s
    results[i, sequences[[i]]] <- 1
  results
}
# vectorize training data
x_train <- vectorize_sequences(train_data)
# vectorize test data
x_test <- vectorize_sequences(test_data)
#vectorize labels
y_train <- as.numeric(train_labels)
y_test <- as.numeric(test_labels)

The data type of a sample is now numeric:

str(x_train[1,])
 num [1:10000] 1 1 0 1 1 1 1 1 1 0 ...
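
As a toy illustration of what vectorize_sequences does (using a made-up dimension of 10 instead of 10,000), the single sequence c(3, 5) becomes a row with 1s at positions 3 and 5:

vectorize_sequences(list(c(3, 5)), dimension = 10)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    0    0    1    0    1    0    0    0    0     0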

Building the Neural network

The first model has three dense layers; the two hidden layers use 16 units with the 'relu' activation, layer_dense(units = 16, activation = "relu"). Each dense layer with a relu activation implements the following chain of tensor operations: output = relu(dot(W, input) + b).

Having 16 hidden units means that the weight matrix W will have shape (input_dimension, 16), i.e. the dot product with W will project the input data onto a 16-dimensional representation space (and then we would add the bias vector b and apply the relu operation).
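
To make the shapes concrete, here is a rough sketch of the computation performed by one such layer, using random (untrained) weights rather than anything learned by the model:

relu <- function(x) pmax(x, 0)

set.seed(42)
W <- matrix(rnorm(10000 * 16, sd = 0.01), nrow = 10000, ncol = 16)  # weight matrix, shape (10000, 16)
b <- rep(0, 16)                                                     # bias vector, length 16

x <- x_train[1, ]                           # one vectorized review, length 10000
output <- relu(as.vector(x %*% W) + b)      # 16-dimensional hidden representation
length(output)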

library(keras)
firstmodel_3layer <- keras_model_sequential() %>% 
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>% 
  layer_dense(units = 16, activation = "relu") %>% 
  layer_dense(units = 1, activation = "sigmoid")

Configure the model - optimiser selection

Lastly, we need to pick a loss function and an optimizer. Crossentropy is usually the best choice when dealing with models that output probabilities, and in particular for binary classification problems. Crossentropy is a quantity from the field of information theory that measures the “distance” between probability distributions, or in our case, between the ground-truth distribution and our predictions.
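
For a single example with true label y in {0, 1} and predicted probability p, binary crossentropy is -(y log p + (1 - y) log(1 - p)). A small hand-rolled sketch (hypothetical predictions, only to show the quantity that Keras averages over a batch):

binary_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)   # clip to avoid log(0)
  -mean(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred))
}
binary_crossentropy(c(1, 0, 1), c(0.9, 0.2, 0.6))   # small loss for mostly-correct predictions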

We configure our model with the rmsprop optimizer and the binary_crossentropy loss function:

firstmodel_3layer %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)

Validating our approach

In order to monitor, during training, the accuracy of the model on data it has never seen before, we will create a “validation set” by setting apart 10,000 samples from the original training data:

val_indices <- 1:10000
x_val <- x_train[val_indices,]
partial_x_train <- x_train[-val_indices,]
y_val <- y_train[val_indices]
partial_y_train <- y_train[-val_indices]

We train our model for 20 epochs (20 iterations over all samples in the x_train and y_train tensors), in mini-batches of 512 samples. At the same time we will monitor loss and accuracy on the 10,000 samples that we set apart. This is done by passing the validation data as the validation_data argument:

firstmodel_3layer %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)
history <- firstmodel_3layer %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val)
)
Train on 15000 samples, validate on 10000 samples
Epoch  1/20 - loss: 0.5075 - acc: 0.7833 - val_loss: 0.3810 - val_acc: 0.8669
Epoch  2/20 - loss: 0.2998 - acc: 0.9036 - val_loss: 0.2996 - val_acc: 0.8905
Epoch  3/20 - loss: 0.2164 - acc: 0.9295 - val_loss: 0.2989 - val_acc: 0.8805
Epoch  4/20 - loss: 0.1740 - acc: 0.9427 - val_loss: 0.2791 - val_acc: 0.8877
Epoch  5/20 - loss: 0.1416 - acc: 0.9539 - val_loss: 0.2831 - val_acc: 0.8873
Epoch  6/20 - loss: 0.1143 - acc: 0.9654 - val_loss: 0.3111 - val_acc: 0.8814
Epoch  7/20 - loss: 0.0971 - acc: 0.9715 - val_loss: 0.3132 - val_acc: 0.8833
Epoch  8/20 - loss: 0.0806 - acc: 0.9765 - val_loss: 0.3841 - val_acc: 0.8677
Epoch  9/20 - loss: 0.0663 - acc: 0.9821 - val_loss: 0.3653 - val_acc: 0.8755
Epoch 10/20 - loss: 0.0557 - acc: 0.9853 - val_loss: 0.3847 - val_acc: 0.8774
Epoch 11/20 - loss: 0.0454 - acc: 0.9886 - val_loss: 0.4164 - val_acc: 0.8760
Epoch 12/20 - loss: 0.0384 - acc: 0.9912 - val_loss: 0.4493 - val_acc: 0.8693
Epoch 13/20 - loss: 0.0281 - acc: 0.9952 - val_loss: 0.4824 - val_acc: 0.8699
Epoch 14/20 - loss: 0.0249 - acc: 0.9949 - val_loss: 0.5048 - val_acc: 0.8699
Epoch 15/20 - loss: 0.0175 - acc: 0.9980 - val_loss: 0.5941 - val_acc: 0.8592
Epoch 16/20 - loss: 0.0168 - acc: 0.9973 - val_loss: 0.5711 - val_acc: 0.8694
Epoch 17/20 - loss: 0.0137 - acc: 0.9981 - val_loss: 0.6002 - val_acc: 0.8674
Epoch 18/20 - loss: 0.0095 - acc: 0.9990 - val_loss: 0.6311 - val_acc: 0.8670
Epoch 19/20 - loss: 0.0083 - acc: 0.9990 - val_loss: 0.6688 - val_acc: 0.8667
Epoch 20/20 - loss: 0.0100 - acc: 0.9977 - val_loss: 0.6962 - val_acc: 0.8653

The call to fit() returns a history object. Let’s take a look at it:

str(history)
List of 2
 $ params :List of 8
  ..$ metrics           : chr [1:4] "loss" "acc" "val_loss" "val_acc"
  ..$ epochs            : int 20
  ..$ steps             : NULL
  ..$ do_validation     : logi TRUE
  ..$ samples           : int 15000
  ..$ batch_size        : int 512
  ..$ verbose           : int 1
  ..$ validation_samples: int 10000
 $ metrics:List of 4
  ..$ acc     : num [1:20] 0.783 0.904 0.929 0.943 0.954 ...
  ..$ loss    : num [1:20] 0.507 0.3 0.216 0.174 0.142 ...
  ..$ val_acc : num [1:20] 0.867 0.891 0.881 0.888 0.887 ...
  ..$ val_loss: num [1:20] 0.381 0.3 0.299 0.279 0.283 ...
 - attr(*, "class")= chr "keras_training_history"
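
The metrics stored in history can be inspected directly; for instance, the keras package provides a plot() method for training histories, and the epoch with the lowest validation loss can be picked out by hand:

plot(history)                          # loss / accuracy curves for training and validation
which.min(history$metrics$val_loss)    # epoch with the lowest validation loss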

Second model

The second model has four layers, and each hidden layer has 32 units.

library(keras)
Secondmodel_4layer <- keras_model_sequential() %>% 
  layer_dense(units = 32, activation = "relu", input_shape = c(10000)) %>% 
  layer_dense(units = 32, activation = "relu") %>% 
  layer_dense(units = 32, activation = "relu") %>% 
  layer_dense(units = 1, activation = "sigmoid")

Configure the model

Validation

val_indices <- 1:10000
x_val2 <- x_train[val_indices,]
partial_x_train2 <- x_train[-val_indices,]
y_val2 <- y_train[val_indices]
partial_y_train2 <- y_train[-val_indices]

We will now train this model for 20 epochs, in mini-batches of 512 samples, with MSE as the loss function.

Secondmodel_4layer %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = c("accuracy")
)
Secondmodelhistory <- Secondmodel_4layer %>% fit(
  partial_x_train2,
  partial_y_train2,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val2, y_val2)
)

Comparing the two models by Plotting

compare_cx <- data.frame(
  firstmodel_3layer_train = history$metrics$loss,
  firstmodel_3layer_val = history$metrics$val_loss,
  Secondmodel_4layer_train = Secondmodelhistory$metrics$loss,
  Secondmodel_val = Secondmodelhistory$metrics$val_loss
) %>%
  rownames_to_column() %>%
  mutate(rowname = as.integer(rowname)) %>%
  gather(key = "type", value = "value", -rowname)
  
p <- plot_ly(compare_cx,
             x = ~rowname,
             y = ~value,
             color = ~type,
             type = "scatter",
             mode = "lines") %>% 
  layout(title = "<b>Fig 1</b> Comparing model losses",
         xaxis = list(title = "Epochs"),
         yaxis = list(title = "Loss"))
p

Results of both models on the validation set:

results2 <- Secondmodel_4layer %>% evaluate(x_val2, y_val2)

results <- firstmodel_3layer %>% evaluate(x_val, y_val)

results
$loss
[1] 0.6961613

$acc
[1] 0.8653
results2
$loss
[1] 0.1187518

$acc
[1] 0.8656

The accuracy of both models on the validation set is nearly the same: 86.5% and 86.6%.

Performance on the test set of both models

Secondmodel_4layer %>% fit(x_train, y_train, epochs = 20, batch_size = 512)
resultstest2 <- Secondmodel_4layer %>% evaluate(x_test, y_test)
resultstest2
firstmodel_3layer %>% fit(x_train, y_train, epochs = 20, batch_size = 512)
resultstest <- firstmodel_3layer %>% evaluate(x_test, y_test)
resultstest

On the test set, the accuracy is only 84.9% for the first model and lower still, 83.6%, for the second model. This is because of overfitting during training, which can be addressed with regularization techniques. Regularization is the process of modulating the quantity of information that a model is allowed to store, or of adding constraints on what information it is allowed to store.

Now we will use two techniques: weight regularization and dropout.

1. Regularization

Reducing network size: the more capacity the network has, the more quickly it can model the training data, but the more susceptible it is to overfitting. Applying weight regularization with L2: the cost added to the loss function is proportional to the square of the weight coefficients.
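
As a rough illustration of the two penalties (a toy sketch with a made-up weight vector, not taken from the model), using the same l = 0.001 as the regularizers below:

w <- c(0.5, -0.3, 0.1)     # hypothetical layer weights
l <- 0.001
l * sum(w^2)               # L2 penalty: proportional to the squared weights
l * sum(abs(w))            # L1 penalty: proportional to the absolute weights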

A network with two hidden layers, units = 16, loss = "binary_crossentropy":

model3 <- 
  keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000),
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dense(units = 16, activation = "relu",
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dense(units = 1, activation = "sigmoid")
model3 %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = list("accuracy")
)
model3 %>% summary()

Validation

model3history <-model3 %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val),
  verbose = 2
)

Results :

results3 <- model3 %>% evaluate(x_val, y_val)

results3
$loss
[1] 0.5141508

$acc
[1] 0.8657

1.1 Regularization

Applying weight regularization with L1: the cost added to the loss function is proportional to the absolute value of the weight coefficients.

Model with regularizer_l1, loss function mse, activation = softmax.

2. Dropout

Dropout is one of the most effective and most commonly used regularization techniques for neural networks. The dropout rate is the fraction of the features that are zeroed out.
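
A minimal sketch of the idea at training time (illustrative only, not the internal implementation of layer_dropout): with rate = 0.6, roughly 60% of the activations are zeroed, and the survivors are rescaled by 1/(1 - rate) so the expected output stays the same:

set.seed(123)
rate <- 0.6
activations <- matrix(runif(8), nrow = 2)               # pretend output of a layer, shape (2, 4)
keep_mask <- matrix(rbinom(8, 1, 1 - rate), nrow = 2)   # keep each unit with probability 1 - rate
activations * keep_mask / (1 - rate)                    # "inverted dropout" scaling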

dropout_model4 <- 
  keras_model_sequential() %>%
  layer_dense(units = 64, activation = "softmax", input_shape = c(10000)) %>%
  layer_dropout(0.6) %>%
  layer_dense(units = 16, activation = "softmax") %>%
  layer_dropout(0.6) %>%
  layer_dense(units = 1, activation = "sigmoid")
dropout_model4 %>% compile(
  optimizer = "adam",
  loss = "mse",
  metrics = list("accuracy")
)
dropout_model4 %>% summary()

Validation - dropout model

dropout_history4 <- dropout_model4 %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val),
  verbose = 2
)

Results :

results4 <- dropout_model4 %>% evaluate(x_val, y_val)

results4
$loss
[1] 0.1686238

$acc
[1] 0.8902

The best accuracy on the validation data was achieved with the dropout method. The configurations tried were:

- Optimiser = 'adam', activation = relu: acc = 88.45% and loss = 0.4157
- Optimiser = 'rmsprop', activation = relu: acc = 87.4% and loss = 0.6751
- Optimiser = 'adam', activation = Tanh: acc = 87.25% and loss = 0.5243
- Optimiser = 'adam', activation = relu, loss function = mse, units = 32: acc = 88.59% and loss = 0.0951
- Optimiser = 'adam', activation = relu, loss function = mse, units = 64: acc = 88.59% and loss = 0.0951
- Optimiser = 'adam', activation = softmax, loss function = mse, units = 64: acc = 89.02% and loss = 0.1686

This last configuration was the most successful, with the highest accuracy.

Let's check the model's performance on the test set:

dropout_model4 %>% fit(x_train, y_train, epochs = 4, batch_size = 512)
Epoch 1/4 - loss: 0.1968 - acc: 0.6976
Epoch 2/4 - loss: 0.1951 - acc: 0.7002
Epoch 3/4 - loss: 0.1922 - acc: 0.7064
Epoch 4/4 - loss: 0.1912 - acc: 0.7069
results5 <- dropout_model4 %>% evaluate(x_test, y_test)

results5
$loss
[1] 0.152684

$acc
[1] 0.8882

Comparison of Regularization and Dropout models: Plot

compare_cx <- data.frame(
  model3_train = model3history$metrics$loss,
  model3_train_val = model3history$metrics$val_loss,
  dropout_model4_train = dropout_history4$metrics$loss,
 dropout_model4_val = dropout_history4$metrics$val_loss
) %>%
  rownames_to_column() %>%
  mutate(rowname = as.integer(rowname)) %>%
  gather(key = "type", value = "value", -rowname)
  
p2 <- plot_ly(compare_cx,
             x = ~rowname,
             y = ~value,
             color = ~type,
             type = "scatter",
             mode = "lines") %>% 
  layout(title = "<b>Fig 2</b> Comparing Regularization and dropout model losses",
         xaxis = list(title = "Epochs"),
         yaxis = list(title = "Loss"))
p2
---
title: "Modifying IMDB example"
output: html_notebook
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```


In this notebook, we work on a dataset from the IMDB site to classify movie reviews as "positive" or "negative", based solely on the text content of the reviews.
```{r}
library(readr)
library(tidyr)
library(tibble)
library(plotly)
```


## The IMDB dataset

The IMDB dataset is a set of 50,000 highly polarized reviews from the Internet Movie Database. They are split into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of 50% negative and 50% positive reviews.
The IMDB dataset comes packaged with Keras. It has already been preprocessed: the reviews (sequences of words) have been turned into sequences of integers, where each integer stands for a specific word in a dictionary.

## Loading the dataset


```{r}
library(keras)
imdb <- dataset_imdb(num_words = 10000)
c(c(train_data, train_labels), c(test_data, test_labels)) %<-% imdb
```


The argument `num_words = 10000` means that we will only keep the top 10,000 most frequently occurring words in the training data. Rare words will be discarded. This allows us to work with vector data of manageable size.

The variables `train_data` and `test_data` are lists of reviews, each review being a list of word indices (encoding a sequence of words). `train_labels` and `test_labels` are lists of 0s and 1s, where 0 stands for "negative" and 1 stands for "positive":
```{r}
train_labels[[1]]
```


Since only the top 10,000 most frequent words are kept, no word index will exceed 10,000:
```{r}
max(sapply(train_data, max))
```


##  Data Preparation
* The one-hot-encoding method is used here so that the lists are converted to vectors of 0s and 1s.
For example, this would turn the sequence `[3, 5]` into a 10,000-dimensional vector that would be all zeros except for indices 3 and 5, which would be ones. This allows the first layer in the network to be a dense layer, capable of handling floating-point vector data.
* Vectorize the labels.

```{r}
vectorize_sequences <- function(sequences, dimension = 10000) {
  # Creating all-zero matrix of shape (len(sequences), dimension)
  results <- matrix(0, nrow = length(sequences), ncol = dimension)
  for (i in 1:length(sequences))
    # Setting specific indices of results[i] to 1s
    results[i, sequences[[i]]] <- 1
  results
}
# vectorize training data
x_train <- vectorize_sequences(train_data)
# vectorize test data
x_test <- vectorize_sequences(test_data)
#vectorize labels
y_train <- as.numeric(train_labels)
y_test <- as.numeric(test_labels)
```

The data type of a sample is now numeric:
```{r}
str(x_train[1,])
```


## Building the Neural network
The first model has three dense layers; the two hidden layers use 16 units with the `relu` activation,
`layer_dense(units = 16, activation = "relu")`.
Each dense layer with a `relu` activation implements the following chain of tensor operations:
`output = relu(dot(W, input) + b)`

Having 16 hidden units means that the weight matrix `W` will have shape `(input_dimension, 16)`, i.e. the dot product with `W` will project the input data onto a 16-dimensional representation space (and then we would add the bias vector `b` and apply the `relu` operation). 
```{r}
library(keras)
firstmodel_3layer <- keras_model_sequential() %>% 
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>% 
  layer_dense(units = 16, activation = "relu") %>% 
  layer_dense(units = 1, activation = "sigmoid")
```


## Configure the model - optimiser selection
Lastly, we need to pick a loss function and an optimizer. Crossentropy is usually the best choice when dealing with models that output probabilities, and in particular for binary classification problems. Crossentropy is a quantity from the field of information theory that measures the "distance" between probability distributions, or in our case, between the ground-truth distribution and our predictions.

We configure our model with the `rmsprop` optimizer and the `binary_crossentropy` loss function:
```{r}
firstmodel_3layer %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)
```


## Validating our approach

In order to monitor, during training, the accuracy of the model on data it has never seen before, we will create a "validation set" by setting apart 10,000 samples from the original training data:

```{r}
val_indices <- 1:10000

x_val <- x_train[val_indices,]
partial_x_train <- x_train[-val_indices,]

y_val <- y_train[val_indices]
partial_y_train <- y_train[-val_indices]
```

We train our model for 20 epochs (20 iterations over all samples in the `x_train` and `y_train` tensors), in mini-batches of 512 samples. At the same time we will monitor loss and accuracy on the 10,000 samples that we set apart. This is done by passing the validation data as the `validation_data` argument:
```{r}
firstmodel_3layer %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)

history <- firstmodel_3layer %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val)
)

```
The call to `fit()` returns a `history` object. Let's take a look at it:
```{r}
str(history)
```

# Second model

The second model has four layers, and each hidden layer has 32 units.
```{r}
library(keras)
Secondmodel_4layer <- keras_model_sequential() %>% 
  layer_dense(units = 32, activation = "relu", input_shape = c(10000)) %>% 
  layer_dense(units = 32, activation = "relu") %>% 
  layer_dense(units = 32, activation = "relu") %>% 
  layer_dense(units = 1, activation = "sigmoid")
```

### Configure the model
 
## Validation
```{r}
val_indices <- 1:10000

x_val2 <- x_train[val_indices,]
partial_x_train2 <- x_train[-val_indices,]

y_val2 <- y_train[val_indices]
partial_y_train2 <- y_train[-val_indices]
```

We will now train this model for 20 epochs, in mini-batches of 512 samples, with MSE as the loss function.

```{r, echo=TRUE, results='hide'}
Secondmodel_4layer %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = c("accuracy")
)

Secondmodelhistory <- Secondmodel_4layer %>% fit(
  partial_x_train2,
  partial_y_train2,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val2, y_val2)
)
```
## Comparing the two models by Plotting

```{r}
compare_cx <- data.frame(
  firstmodel_3layer_train = history$metrics$loss,
  firstmodel_3layer_val = history$metrics$val_loss,
  Secondmodel_4layer_train = Secondmodelhistory$metrics$loss,
  Secondmodel_val = Secondmodelhistory$metrics$val_loss
) %>%
  rownames_to_column() %>%
  mutate(rowname = as.integer(rowname)) %>%
  gather(key = "type", value = "value", -rowname)
  
p <- plot_ly(compare_cx,
             x = ~rowname,
             y = ~value,
             color = ~type,
             type = "scatter",
             mode = "lines") %>% 
  layout(title = "<b>Fig 1</b> Comparing model losses",
         xaxis = list(title = "Epochs"),
         yaxis = list(title = "Loss"))
p
```
# Results of both models on the validation set:
```{r}
results2 <- Secondmodel_4layer %>% evaluate(x_val2, y_val2)
results <- firstmodel_3layer %>% evaluate(x_val, y_val)
results
results2

```

The accuracy of both models on the validation set is nearly the same: 86.5% and 86.6%.

## Performance on the test set of both models
```{r, echo=TRUE, results='hide'}
Secondmodel_4layer %>% fit(x_train, y_train, epochs = 20, batch_size = 512)
resultstest2 <- Secondmodel_4layer %>% evaluate(x_test, y_test)
resultstest2
firstmodel_3layer %>% fit(x_train, y_train, epochs = 20, batch_size = 512)
resultstest <- firstmodel_3layer %>% evaluate(x_test, y_test)
resultstest

```
On the test set, the accuracy is only 84.9% for the first model and lower still, 83.6%, for the second model. This is because of overfitting during training,
which can be addressed with regularization techniques. Regularization is the process of modulating the quantity of information that a model is allowed to store, or of adding constraints on what information it is allowed to store.

Now we will use two techniques: weight regularization and dropout.

## 1. Regularization
Reducing network size: the more capacity the network has, the more quickly it can model the training data, but the more susceptible it is to overfitting. Applying weight regularization with L2: the cost added to the loss function is proportional to the square of the weight coefficients.

A network with two hidden layers, units = 16, loss = "binary_crossentropy":

```{r, echo=TRUE, results='hide'}
model3 <- 
  keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000),
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dense(units = 16, activation = "relu",
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dense(units = 1, activation = "sigmoid")

model3 %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = list("accuracy")
)

model3 %>% summary()

```

## Validation
```{r, echo=TRUE, results='hide'}
model3history <-model3 %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val),
  verbose = 2
)

```

# Results:
```{r}
results3 <- model3 %>% evaluate(x_val, y_val)
results3
```
## 1.1 Regularization
Applying weight regularization with L1: the cost added to the loss function is proportional to the absolute value of the weight coefficients.
```{r, echo=TRUE, results='hide'}
model3.1 <- 
  keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000),
              kernel_regularizer = regularizer_l1(l = 0.001)) %>%
  layer_dense(units = 16, activation = "relu",
              kernel_regularizer = regularizer_l1(l = 0.001)) %>%
  layer_dense(units = 1, activation = "sigmoid")

model3.1 %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = list("accuracy")
)

model3.1 %>% summary()

# validation 
model3.1history <-model3.1 %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val),
  verbose = 2
)

#Results of validation with l1
results3.1 <- model3.1 %>% evaluate(x_val, y_val)
results3.1
# the $loss is 
#[1] 0.4996383
#$acc
#[1] 0.8752
```

# Model with regularizer_l1, loss function mse, activation = softmax
```{r, echo=TRUE, results='hide'}
model3.2 <- 
  keras_model_sequential() %>%
  layer_dense(units = 16, activation = "softmax", input_shape = c(10000),
              kernel_regularizer = regularizer_l1(l = 0.001)) %>%
  layer_dense(units = 16, activation = "softmax",
              kernel_regularizer = regularizer_l1(l = 0.001)) %>%
  layer_dense(units = 1, activation = "sigmoid")

model3.2 %>% compile(
  optimizer = "rmsprop",
  loss = "mse",
  metrics = list("accuracy")
)
model3.2 %>% summary()
# validation of model 3.2
model3.2history <-model3.2%>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val),
  verbose = 2
)

#Results of validation with l1
results3.2 <- model3.2 %>% evaluate(x_val, y_val)
results3.2

```

  

## 2. Dropout

Dropout is one of the most effective and most commonly used regularization techniques for neural networks. The dropout rate is the fraction of the features that are zeroed out.
```{r ,echo=TRUE, results='hide'}
dropout_model4 <- 
  keras_model_sequential() %>%
  layer_dense(units = 64, activation = "softmax", input_shape = c(10000)) %>%
  layer_dropout(0.6) %>%
  layer_dense(units = 16, activation = "softmax") %>%
  layer_dropout(0.6) %>%
  layer_dense(units = 1, activation = "sigmoid")

dropout_model4 %>% compile(
  optimizer = "adam",
  loss = "mse",
  metrics = list("accuracy")
)

dropout_model4 %>% summary()
```
## Validation - dropout model
```{r,echo=TRUE, results='hide'}
dropout_history4 <- dropout_model4 %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val),
  verbose = 2
)

```
# Results:
```{r}
results4 <- dropout_model4 %>% evaluate(x_val, y_val)
results4
```
The best accuracy on the validation data was achieved with the dropout method. The configurations tried were:

* Optimiser = 'adam', activation = relu: acc = 88.45% and loss = 0.4157
* Optimiser = 'rmsprop', activation = relu: acc = 87.4% and loss = 0.6751
* Optimiser = 'adam', activation = Tanh: acc = 87.25% and loss = 0.5243
* Optimiser = 'adam', activation = relu, loss function = mse, units = 32: acc = 88.59% and loss = 0.0951
* Optimiser = 'adam', activation = relu, loss function = mse, units = 64: acc = 88.59% and loss = 0.0951
* Optimiser = 'adam', activation = softmax, loss function = mse, units = 64: acc = 89.02% and loss = 0.1686


This last configuration was the most successful, with the highest accuracy.

Let's check the model's performance on the test set:
```{r}
dropout_model4 %>% fit(x_train, y_train, epochs = 4, batch_size = 512)
results5 <- dropout_model4 %>% evaluate(x_test, y_test)
results5
# For the configuration: Optimiser = 'adam', activation = softmax, loss function = mse, units = 64 (validation acc = 89.02% and loss = 0.1686)
# $loss = 0.152684 and $acc = 0.8882, the highest on the test set
```

## Comparison of Regularization and Dropout models: Plot
```{r}
compare_cx <- data.frame(
  model3_train = model3history$metrics$loss,
  model3_train_val = model3history$metrics$val_loss,
  dropout_model4_train = dropout_history4$metrics$loss,
 dropout_model4_val = dropout_history4$metrics$val_loss
) %>%
  rownames_to_column() %>%
  mutate(rowname = as.integer(rowname)) %>%
  gather(key = "type", value = "value", -rowname)
  
p2 <- plot_ly(compare_cx,
             x = ~rowname,
             y = ~value,
             color = ~type,
             type = "scatter",
             mode = "lines") %>% 
  layout(title = "<b>Fig 2</b> Comparing Regularization and dropout model losses",
         xaxis = list(title = "Epochs"),
         yaxis = list(title = "Loss"))
p2
```

